Detecting Errors in Corpora Using Support Vector Machines
نویسندگان
چکیده
While the corpus-based research relies on human annotated corpora, it is often said that a non-negligible amount of errors remain even in frequently used corpora such as Penn Treebank. Detection of errors in annotated corpora is important for corpus-based natural language processing. In this paper, we propose a method to detect errors in corpora using support vector machines (SVMs). This method is based on the idea of extracting exceptional elements that violate consistency. We propose a method of using SVMs to assign a weight to each element and to find errors in a POS tagged corpus. We apply the method to English and Japanese POS-tagged corpora and achieve high precision in detecting errors.
منابع مشابه
A Comparative Study of Extreme Learning Machines and Support Vector Machines in Prediction of Sediment Transport in Open Channels
The limiting velocity in open channels to prevent long-term sedimentation is predicted in this paper using a powerful soft computing technique known as Extreme Learning Machines (ELM). The ELM is a single Layer Feed-forward Neural Network (SLFNN) with a high level of training speed. The dimensionless parameter of limiting velocity which is known as the densimetric Froude number (Fr) is predicte...
متن کاملFault diagnosis in a distillation column using a support vector machine based classifier
Fault diagnosis has always been an essential aspect of control system design. This is necessary due to the growing demand for increased performance and safety of industrial systems is discussed. Support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...
متن کاملA prediction distribution of atmospheric pollutants using support vector machines, discriminant analysis and mapping tools (Case study: Tunisia)
Monitoring and controlling air quality parameters form an important subject of atmospheric and environmental research today due to the health impacts caused by the different pollutants present in the urban areas. The support vector machine (SVM), as a supervised learning analysis method, is considered an effective statistical tool for the prediction and analysis of air quality. The work present...
متن کاملSTAGE-DISCHARGE MODELING USING SUPPORT VECTOR MACHINES
Establishment of rating curves are often required by the hydrologists for flow estimates in the streams, rivers etc. Measurement of discharge in a river is a time-consuming, expensive, and difficult process and the conventional approach of regression analysis of stage-discharge relation does not provide encouraging results especially during the floods. P
متن کاملA prediction distribution of atmospheric pollutants using support vector machines, discriminant analysis and mapping tools (Case study: Tunisia)
Monitoring and controlling air quality parameters form an important subject of atmospheric and environmental research today due to the health impacts caused by the different pollutants present in the urban areas. The support vector machine (SVM), as a supervised learning analysis method, is considered an effective statistical tool for the prediction and analysis of air quality. The work present...
متن کامل